1. Introduction.

In this report, we discuss new results and insights concerning an iterative
procedure introduced in [1] for obtaining maximum-likelihood estimates of the
parameters for a mixture of normal distributions. For any questions concerning
notation, definitions, etc., the reader is referred to that report.

The iterative procedure in question is the following: Beginning with some
starting value

$$(\alpha^{(1)}, \mu^{(1)}, \Sigma^{(1)})$$

in the parameter space introduced in [1], define
successive iterates inductively by the relationship

(*)  $$(\alpha^{(k+1)}, \mu^{(k+1)}, \Sigma^{(k+1)}) = \Phi_\varepsilon(\alpha^{(k)}, \mu^{(k)}, \Sigma^{(k)})$$

given in [1]. It is shown in [1] that, with probability approaching 1 as
the sample size N approaches infinity, this procedure converges locally to
the consistent maximum-likelihood estimate whenever $\varepsilon$ is sufficiently small.
(In particular, [1] gives an explicit upper bound on $\varepsilon$ that guarantees the
local convergence of this procedure in probability.)

In this report, we prove that, in probability, the procedure converges
locally to the consistent maximum-likelihood estimate whenever $0 < \varepsilon < 2$. We
also show that the $\varepsilon$ which yields optimal local convergence rates lies between
1 and 2. In fact, the optimal $\varepsilon$ is near 1 if the component populations
are widely separated, and near 2 if the component populations have nearly
identical means and covariance matrices.


2. Local Convergence.

As in [1], we say that $\Phi_\varepsilon$ is locally contractive (in a norm $\|\cdot\|$ on
the parameter space of [1]) near $(\alpha, \mu, \Sigma)$ if there is a number $\lambda$, $0 < \lambda < 1$,
such that

$$\|\Phi_\varepsilon(\alpha', \mu', \Sigma') - (\alpha, \mu, \Sigma)\| \le \lambda \, \|(\alpha', \mu', \Sigma') - (\alpha, \mu, \Sigma)\|$$

whenever $(\alpha', \mu', \Sigma')$ lies sufficiently near $(\alpha, \mu, \Sigma)$.

Our result is the following.


Theorem. With probability approaching 1 as N approaches infinity, $\Phi_\varepsilon$ is
a locally contractive operator (in a norm to be defined on the parameter space
of [1]) near the consistent maximum-likelihood estimate whenever $0 < \varepsilon < 2$.

Corollary. With probability approaching 1 as N approaches infinity, the
iterative procedure (*) converges locally to the consistent maximum-likelihood
estimate whenever $0 < \varepsilon < 2$.


Proof: As observed in [1], the theorem will be proved if it can be shown that,
for $0 < \varepsilon < 2$, $E(\nabla\Phi_\varepsilon(\alpha^o, \mu^o, \Sigma^o))$ has operator norm less than 1 with respect
to some vector norm on the parameter space of [1]. (Throughout this note, the superscript
"o" indicates that the superscripted parameters are the true parameters of the
mixture density.) For $i = 1, \dots, m$, let $\langle\,,\rangle_i'$ and $\langle\,,\rangle_i''$ be the inner
products on $\mathbb{R}^n$ and the space of real, symmetric $n \times n$ matrices introduced in [1],
i.e., let


<v,w>j^ = ^ 

II ^“1 >r 

<A,B>^ - tr{A(— )B } for real, symmetric nXn A and B. 
These inner products, together with scalar multiplication on $\mathbb{R}^1$, induce an
inner product $\langle\,,\rangle$ on the full parameter space. Now
$E(\nabla\Phi_\varepsilon(\alpha^o, \mu^o, \Sigma^o)) = I - \varepsilon QR$,
where


and 
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^p dx 


One sees that the theorem will be proved if it can be shown that, with respect
to some vector norm on the parameter space, the operator norm of $QR$ is no greater
than 1. Since $QR$ is positive definite and symmetric with respect to the
inner product $\langle \cdot, Q^{-1} \cdot \rangle$, it follows that the theorem will be proved if it can
be shown that $\langle V, Q^{-1}[QR]V \rangle = \langle V, RV \rangle \le \langle V, Q^{-1}V \rangle$ for every $V$ in that space.
For


one has 





<V,RV> - 




R" 


J^tr{A^(^i *(*-W°l) (x-wi)’’ - II’^})^ to 

®*P 




dx 


R“ 


by Schwarz's inequality. If the squared expressions in the last sum above are
written out in full, one sees that the integrals of the cross terms in these
expressions vanish. Consequently,
Now

(1) ( “‘j'Mpi'I* - “I'M 

(2) I (vjE°“^(x-w"))^a^p^dx - J vJe""^(x-M®j_) (x-y') Vj^"^v^o®Pj^dx 

r“ R® 

- 

(3) j (tr{A^(|2^‘b [S:'i‘“^(x-Uj) (x-u;)’^-I]''})^a^Pidx - <A^,Z*"^A^ >’^ 
r" 

(A proof of (3) follows below.) From (1), (2), and (3), one concludes that

$$\langle V, RV \rangle \le \langle V, Q^{-1}V \rangle.$$

This completes the proof of the theorem.
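
The eigenvalue argument behind the theorem can be checked numerically. The sketch below assumes, purely for illustration, that QR has been diagonalized and is represented by a hypothetical list of eigenvalues in (0, 1]; the list `eigs` and the function name are not taken from the report. It evaluates the spectral radius of I - eps*QR for several step sizes.

```python
def contraction_factor(eps, eigs):
    """Spectral radius of I - eps*QR, given the eigenvalues of QR."""
    return max(abs(1.0 - eps * lam) for lam in eigs)

# Hypothetical spectrum of QR: positive, with largest eigenvalue 1.
eigs = [0.2, 0.5, 1.0]

# Step sizes inside (0, 2) give a factor below 1; 2.0 and 2.5 do not.
for eps in (0.5, 1.0, 1.5, 1.9, 2.0, 2.5):
    print(eps, contraction_factor(eps, eigs))
```

For any spectrum contained in (0, 1], each factor |1 - eps*lambda| is below 1 exactly when 0 < eps*lambda < 2, which holds for every such eigenvalue when 0 < eps < 2.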

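Proof of (3): Setting $y = (\Sigma_i^o)^{-1/2}(x - \mu_i^o)$ and

$$I = \int_{\mathbb{R}^n} \left( \mathrm{tr}\{ A_i (\tfrac{1}{2} (\Sigma_i^o)^{-1}) [ (\Sigma_i^o)^{-1} (x - \mu_i^o)(x - \mu_i^o)^T - I ]^T \} \right)^2 \alpha_i^o p_i \, dx,$$

one obtains

$$I = \frac{\alpha_i^o}{4} \int_{\mathbb{R}^n} \left( \mathrm{tr}\{ (\Sigma_i^o)^{-1/2} A_i (\Sigma_i^o)^{-1/2} [ y y^T - I ] \} \right)^2 p_0 \, dy,$$

where $p_0 \sim N(0, I)$. Denoting $(\Sigma_i^o)^{-1/2} A_i (\Sigma_i^o)^{-1/2} = B = (b_{jk})$,
one then derives

$$I = \frac{\alpha_i^o}{4} \int_{\mathbb{R}^n} ( \mathrm{tr}\{ B [ y y^T - I ] \} )^2 p_0 \, dy
  = \frac{\alpha_i^o}{4} \int_{\mathbb{R}^n} [ ( \mathrm{tr}\{ B y y^T \} )^2 - 2 \, \mathrm{tr}\{B\} \, \mathrm{tr}\{ B y y^T \} + ( \mathrm{tr}\{B\} )^2 ] p_0 \, dy$$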
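
The expansion above leaves only second and fourth moments of a standard normal vector to evaluate. As a hedged check, not text recovered from the report, the sketch below uses Isserlis' theorem for the fourth moments and compares the result with 2 tr{B^2} for a symmetric B, which is the standard value of this expectation. The matrix `B` and all function names are hypothetical.

```python
def trace(M):
    return sum(M[j][j] for j in range(len(M)))

def matmul(A, B):
    n = len(A)
    return [[sum(A[j][l] * B[l][k] for l in range(n)) for k in range(n)]
            for j in range(n)]

def fourth_moment(j, k, l, m):
    """E[y_j y_k y_l y_m] for y ~ N(0, I), by Isserlis' theorem."""
    d = lambda a, b: 1.0 if a == b else 0.0
    return d(j, k) * d(l, m) + d(j, l) * d(k, m) + d(j, m) * d(k, l)

def expected_square(B):
    """E[(tr{B[yy^T - I]})^2], expanded term by term."""
    idx = range(len(B))
    # E[(tr{B y y^T})^2] = sum over j,k,l,m of b_jk b_lm E[y_j y_k y_l y_m]
    quad = sum(B[j][k] * B[l][m] * fourth_moment(j, k, l, m)
               for j in idx for k in idx for l in idx for m in idx)
    t = trace(B)
    # E[tr{B y y^T}] = tr{B}, because E[y y^T] = I
    return quad - 2.0 * t * t + t * t

# Hypothetical symmetric test matrix.
B = [[2.0, 0.5, 0.0],
     [0.5, 1.0, -1.0],
     [0.0, -1.0, 3.0]]
lhs = expected_square(B)
rhs = 2.0 * trace(matmul(B, B))
print(lhs, rhs)
```

Under this identity the integral would reduce to (alpha_i^o / 2) tr{B^2}; how the report expresses that value in terms of its inner products is not recoverable from this copy.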
3. The optimal $\varepsilon$.

From the proof of the theorem, one sees that, asymptotically as N approaches
infinity, the value of $\varepsilon$ which yields optimal local convergence rates is that
which minimizes the spectral radius of $E(\nabla\Phi_\varepsilon(\alpha^o, \mu^o, \Sigma^o))$. (Indeed,
$E(\nabla\Phi_\varepsilon(\alpha^o, \mu^o, \Sigma^o)) = I - \varepsilon QR$ is symmetric with respect to the inner product
$\langle \cdot, Q^{-1} \cdot \rangle$; hence, its operator norm with respect to this inner product is equal
to its spectral radius.) Letting $\rho$ and $\tau$ denote, respectively, the largest
and smallest eigenvalues of $QR$, one verifies that the spectral radius of
$E(\nabla\Phi_\varepsilon(\alpha^o, \mu^o, \Sigma^o))$ is minimized when $1 - \varepsilon\tau = \varepsilon\rho - 1$, i.e., when $\varepsilon = 2/(\rho + \tau)$.

Now $\rho = 1$ always, for it follows from the proof of the theorem that $\rho$ is
never greater than 1, and $QR$ always has an eigenvector with eigenvalue 1.
Thus optimal convergence rates are obtained when $\varepsilon = 2/(1 + \tau)$, where $\tau$ lies
between 0 and 1. In particular, the best choice of $\varepsilon$ lies between 1 and 2.
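
The balance condition 1 - eps*tau = eps*rho - 1 can be sketched in a few lines. The function names are illustrative, and the values of tau are hypothetical; only the formula eps = 2/(rho + tau) with rho = 1 comes from the text.

```python
def spectral_radius(eps, tau, rho=1.0):
    """Spectral radius of I - eps*QR when QR has extreme eigenvalues tau, rho.

    |1 - eps*lam| is convex in lam, so its maximum over [tau, rho]
    is attained at one of the two endpoints.
    """
    return max(abs(1.0 - eps * tau), abs(eps * rho - 1.0))

def optimal_eps(tau, rho=1.0):
    """Step size balancing the two endpoint factors."""
    return 2.0 / (rho + tau)

# Hypothetical smallest eigenvalues: near 1, intermediate, near 0.
for tau in (0.9, 0.5, 0.1):
    eps = optimal_eps(tau)
    print(tau, eps, spectral_radius(eps, tau))
```

At the optimal step the radius equals (1 - tau)/(1 + tau), so a tau near 1 gives a step near 1 with a small radius, and a tau near 0 gives a step near 2 with a radius near 1.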

Suppose that the component populations in the mixture are "widely separated"
in the sense that each pair $(\mu_i^o, \Sigma_i^o)$ differs greatly from every other such
pair. Then



for $x \in \mathbb{R}^n$ and $i, j = 1, \dots, m$,


and one verifies that $QR \approx I$. Consequently, optimal convergence rates are
obtained for an $\varepsilon$ near 1 and, for the optimal $\varepsilon$,
$E(\nabla\Phi_\varepsilon(\alpha^o, \mu^o, \Sigma^o)) = I - \varepsilon QR \approx 0$. Thus for mixtures whose component populations
are "widely separated", optimal convergence rates are obtained for an $\varepsilon$ near
1, and rapid first-order convergence can be expected for this $\varepsilon$.

Now suppose that the component populations in the mixture are such that
each pair $(\mu_i^o, \Sigma_i^o)$ differs little from every other such pair. Then
$p(x) \approx p_i(x)$ and $p_i(x)/p(x) \approx 1$ for $x \in \mathbb{R}^n$ and $i = 1, \dots, m$, and one verifies
that the smallest eigenvalue of $QR$ is near zero. It follows that optimal
convergence rates are obtained for an $\varepsilon$ near 2. In this case, the spectral
radius of $E(\nabla\Phi_\varepsilon(\alpha^o, \mu^o, \Sigma^o))$ is near 1, even for the optimal value of $\varepsilon$;
hence, slow first-order convergence is to be expected.

We conclude by observing that the major practical implication of this note
is that the iterative procedure under consideration converges whenever the
step-size $\varepsilon$ lies in an interval which is completely independent of the particular
mixture problem at hand. It is readily ascertained that this cannot be said for
the regular steepest descent procedure.
Thus the procedure considered here offers considerable practical advantages over
the steepest descent procedure, even though it is itself a generalized steepest
descent (deflected gradient) procedure.
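
A toy sketch may make the step-size statement concrete. It assumes, as a guess that this report does not state, that the operator of [1] has the relaxed form (1 - eps)*theta + eps*G(theta), which is at least consistent with the derivative I - eps*QR being affine in eps. Here G is taken to be the familiar re-estimation of the two means of an equal-weight, unit-variance, one-dimensional normal mixture. The data, the fixed weights and variances, and every function name are hypothetical.

```python
import math

def reestimate_means(mu, data):
    """One re-estimation of the two means (weights 1/2, unit variances fixed)."""
    num = [0.0, 0.0]
    den = [0.0, 0.0]
    for x in data:
        w = [math.exp(-0.5 * (x - m) ** 2) for m in mu]
        total = w[0] + w[1]
        for i in range(2):
            r = w[i] / total          # posterior weight of component i at x
            num[i] += r * x
            den[i] += r
    return [num[i] / den[i] for i in range(2)]

def relaxed_iteration(mu0, data, eps, steps=200):
    """Iterate mu <- (1 - eps)*mu + eps*G(mu); assumed form, see lead-in."""
    mu = list(mu0)
    for _ in range(steps):
        g = reestimate_means(mu, data)
        mu = [(1.0 - eps) * m + eps * gi for m, gi in zip(mu, g)]
    return mu

# Hypothetical, well-separated sample.
data = [-3.5, -3.0, -2.5, 2.5, 3.0, 3.5]
for eps in (0.5, 1.0, 1.5):
    print(eps, relaxed_iteration([-2.0, 2.0], data, eps))
```

In this toy setting the same fixed point is reached for each of the step sizes shown, without any problem-specific tuning of the step.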